European-language-classification | Challenge for startup.ml
kandi X-RAY | European-language-classification Summary
European-language-classification is a Jupyter Notebook library. European-language-classification has no bugs, it has no vulnerabilities and it has low support. You can download it from GitHub.
Challenge for startup.ml. A writeup for the project can be found in the project writeup notebook. The goal is to classify 21 different European languages using data from the European Parliament Proceedings Parallel Corpus, 1996-2011. Scikit-learn is the main tool used here. The text is represented with n-grams, specifically unigrams, bigrams and trigrams, and a simple TF-IDF vectorizer is combined with a perceptron to build the classifier. In the first experiment, only text from the month of January, taken over many years, was used for both training and testing. The F-score was around 0.94, which was surprising. In the second experiment, the same algorithm was trained on all the January text and evaluated against a separate test set. The F-score in this case was around 0.89. Moving forward, I'd like to continue working on the project, optimizing the classifier with better preprocessing, more data and different algorithms.
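The approach described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual notebook code: the toy sentences, the three language labels, the choice of word-level n-grams (`analyzer="word"`), the `random_state`, and macro averaging for the F-score are all assumptions made for the example. Only the general recipe (TF-IDF over unigrams, bigrams and trigrams, fed to a perceptron) comes from the description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy sentences standing in for the Europarl text
# (not taken from the corpus).
train_texts = [
    "the committee will vote on the report tomorrow",
    "the parliament adopted the resolution",
    "le parlement a adopté la résolution",
    "la commission votera sur le rapport demain",
    "das parlament hat die entschließung angenommen",
    "der ausschuss wird morgen über den bericht abstimmen",
]
train_labels = ["en", "en", "fr", "fr", "de", "de"]

test_texts = [
    "the parliament will vote on the resolution",
    "le parlement votera sur la résolution",
    "der ausschuss hat den bericht angenommen",
]
test_labels = ["en", "fr", "de"]

# TF-IDF over unigrams, bigrams and trigrams, followed by a perceptron.
# Word-level n-grams are an assumption; analyzer="char" is an alternative.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 3)),
    Perceptron(random_state=0),
)
classifier.fit(train_texts, train_labels)

predicted = classifier.predict(test_texts)

# Macro-averaged F-score across languages (the averaging mode is assumed).
score = f1_score(test_labels, predicted, average="macro", zero_division=0)
print(list(predicted), round(score, 2))
```

With real data, `train_texts` and `test_texts` would be replaced by sentences read from the corpus files, with one label per sentence for each of the 21 languages.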
Support
European-language-classification has a low active ecosystem.
It has 0 stars and 1 fork. There are 2 watchers for this library.
It had no major release in the last 6 months.
European-language-classification has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of European-language-classification is current.
Quality
European-language-classification has no bugs reported.
Security
European-language-classification has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
European-language-classification does not have a standard license declared.
Check the repository for any license declaration and review the terms closely.
Without a license, all rights are reserved, and you cannot use the library in your applications.
Reuse
European-language-classification releases are not available. You will need to build from source code and install.
European-language-classification Key Features
No Key Features are available at this moment for European-language-classification.
European-language-classification Examples and Code Snippets
No Code Snippets are available at this moment for European-language-classification.
Community Discussions
No Community Discussions are available at this moment for European-language-classification. Refer to the Stack Overflow page for discussions.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install European-language-classification
You can download it from GitHub.
Support
For new features, suggestions and bugs, create an issue on GitHub.
If you have questions, check and ask on the Stack Overflow community page.